A Knowledge-Modeling Approach for Multilingual Regulus Lexica
Abstract
Along with grammar development, the development of lexical resources is one of the main efforts in building multilingual NLP applications. In this paper, we present a tool-based approach for more efficient manual lexicon development for a spoken language translation system. In particular, the approach addresses common problems of multilingual lexica, including redundancy of encoded information and inconsistency between the lexica of different languages. The general benefits of this practical approach are a clear and user-friendly lexicon structure, inheritance of information both within a language and between different system languages, and transparent, consistent coverage across system languages. The visual tool is accessible to linguistic informants who have no previous experience of lexicon development, while remaining a powerful tool for expert system developers.
Similar Resources
Towards multilingual interoperability in automatic speech recognition
In this communication, we address multilingual interoperability aspects in speech recognition. After giving a tentative definition of multilingual interoperability, we discuss speech recognition components and their language-specific aspects. We give a sample overview of past multilingual speech recognition research and development across different speaking styles (read, prepared and conversati...
Historical-Comparative Reconstruction and Multilingual Lexica
This paper argues for the use of formal methods from historical-comparative reconstruction in the design of synchronic representations for multilingual lexica of genetically closely related languages. A model is discussed, followed by an extended example with Slavic languages and an implementation in DATR.
Automatic Verification and Augmentation of Multilingual Lexicons
We present an approach for automatic verification and augmentation of multilingual lexica. We exploit existing parallel and monolingual corpora to extract multilingual correspondents via triangulation. We demonstrate the efficacy of our approach on two publicly available resources: Tharwa, a three-way lexicon comprising Dialectal Arabic, Modern Standard Arabic and English lemmas among other inf...
xLiD-Lexica: Cross-lingual Linked Data Lexica
In this paper, we introduce our cross-lingual linked data lexica, called xLiD-Lexica, which are constructed by exploiting the multilingual Wikipedia and linked data resources from Linked Open Data (LOD). We provide the cross-lingual groundings of linked data resources from LOD as RDF data, which can be easily integrated into the LOD data sources. In addition, we build a SPARQL endpoint over our...
Generation of multilingual ontology lexica with M-ATOLL: a corpus-based approach for the induction of ontology lexica
While there are many large knowledge bases (e.g. Freebase, Yago, DBpedia) as well as linked data sets available on the web, they typically lack lexical information stating how the properties and classes are realized lexically. If at all, typically only one label is attached to these properties, thus lacking any deeper syntactic information, e.g. about syntactic arguments and how these map to th...